106 research outputs found
An investigation into feature effectiveness for multimedia hyperlinking
The growing amount of archival multimedia content available online creates new opportunities for users interested in exploratory search behaviour such as browsing. The user experience with online collections could therefore be improved by enabling navigation and recommendation within multimedia archives, supported by allowing a user to follow a set of hyperlinks created within or across documents. The main goal of this study is to compare the performance of different multimedia features for automatic hyperlink generation. In our work we construct multimedia hyperlinks by indexing and searching textual and visual features extracted from the blip.tv dataset. A user-driven evaluation strategy is then proposed using the Amazon Mechanical Turk (AMT) crowdsourcing platform, since we believe that AMT workers represent a good example of "real world" users. We conclude that textual features exhibit better performance than visual features for multimedia hyperlink construction. In general, a combination of ASR transcripts and metadata provides the best results.
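The text-based hyperlinking idea can be sketched minimally: represent each video segment by tokens from its ASR transcript and metadata, then rank candidate link targets by TF-IDF cosine similarity. The toy segments, tokenisation, and weighting below are illustrative assumptions, not the paper's actual system:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors for tokenised documents."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_links(anchor_idx, docs, top_k=3):
    """Rank candidate hyperlink targets for one anchor segment."""
    vecs = tfidf_vectors(docs)
    scores = [(cosine(vecs[anchor_idx], vecs[j]), j)
              for j in range(len(docs)) if j != anchor_idx]
    return [j for _, j in sorted(scores, reverse=True)[:top_k]]

# Hypothetical video segments: ASR transcript tokens plus metadata keywords.
segments = [
    "cooking pasta tomato sauce recipe".split(),
    "pasta dough recipe italian cooking".split(),
    "stock market trading analysis".split(),
    "tomato garden growing tips".split(),
]
print(rank_links(0, segments, top_k=2))  # most related segments first
```

The same ranking scheme extends to visual features by swapping the term vectors for visual descriptors; the abstract's finding is that the textual side dominates.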
A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
Most speech and language technologies are trained with massive amounts of
speech and text information. However, most of the world's languages do not have
such resources or stable orthography. Systems constructed under these almost
zero resource conditions are not only promising for speech technology but also
for computational language documentation. The goal of computational language
documentation is to help field linguists to (semi-)automatically analyze and
annotate audio recordings of endangered and unwritten languages. Example tasks
are automatic phoneme discovery or lexicon discovery from the speech signal.
This paper presents a speech corpus collected during a realistic language
documentation process. It is made up of 5k speech utterances in Mboshi (Bantu
C25) aligned to French text translations. Speech transcriptions are also made
available: they correspond to a non-standard graphemic form close to the
language phonology. We describe how the data was collected, cleaned and
processed and we illustrate its use through a zero-resource task: spoken term
discovery. The dataset is made available to the community for reproducible
computational language documentation experiments and their evaluation. Comment: accepted to LREC 2018.
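The spoken term discovery task mentioned above amounts to finding repeated acoustic patterns by comparing feature sequences, for which dynamic time warping (DTW) is a standard building block. A minimal sketch, with toy feature vectors standing in for real speech features (they are not from the corpus):

```python
import math

def dtw_distance(a, b):
    """Length-normalised dynamic time warping distance between two
    feature sequences (lists of equal-length coordinate tuples)."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # local frame distance
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m] / (n + m)

# Toy "acoustic" sequences: u and v are time-warped versions of the same
# pattern; w is unrelated.
u = [(0.0, 1.0), (1.0, 2.0), (2.0, 1.0), (1.0, 0.0)]
v = [(0.0, 1.0), (0.1, 1.1), (1.0, 2.0), (2.0, 1.0), (2.0, 1.0), (1.0, 0.0)]
w = [(5.0, 5.0), (6.0, 6.0), (5.0, 6.0)]

print(dtw_distance(u, v) < dtw_distance(u, w))  # the matching pair is closer
```

In a real zero-resource pipeline, segments whose DTW distance falls below a threshold are clustered into candidate lexical items; the sketch only shows the distance itself.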
Towards a multimedia knowledge-based agent with social competence and human interaction capabilities
We present work in progress on an intelligent embodied conversation agent in the basic care and healthcare domain. In contrast to most existing agents, the presented agent is designed to have the linguistic, cultural, social and emotional competence needed to interact with elderly people and migrants. It is composed of an ontology-based, reasoning-driven dialogue manager, multimodal communication analysis and generation modules, and a search engine for the retrieval of multimedia background content from the web needed for conducting a conversation on a given topic. The presented work is funded by the European Commission under contract number H2020-645012-RIA.
Using group delay functions from all-pole models for speaker recognition
This work was presented as a paper at the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), held in Lyon, France, on 25-29 August 2013. Popular features for speech processing, such as mel-frequency cepstral coefficients (MFCCs), are derived from the short-term magnitude spectrum, whereas the phase spectrum remains unused. While the common argument for using only the magnitude spectrum is that the human ear is phase-deaf, phase-based features have remained less explored due to the additional signal processing difficulties they introduce. A useful representation of the phase is the group delay function, but its robust computation remains difficult. This paper advocates the use of group delay functions derived from parametric all-pole models instead of their direct computation from the discrete Fourier transform. Using a subset of the vocal effort data in the NIST 2010 speaker recognition evaluation (SRE) corpus, we show that group delay features derived via parametric all-pole models improve recognition accuracy, especially under high vocal effort. Additionally, the group delay features provide comparable or improved accuracy over conventional magnitude-based MFCC features. Thus, group delay functions derived from all-pole models provide an effective way to utilize information from the phase spectrum of speech signals. Academy of Finland (Grant 253120)
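The all-pole group delay idea can be illustrated numerically: estimate a low-order all-pole model A(z) via the Levinson-Durbin recursion, then take the group delay of H(z) = 1/A(z) as the derivative of the unwrapped phase of A(e^{jw}). This is a minimal sketch, not the paper's feature pipeline; the synthetic one-resonance signal and the model order are assumptions made for illustration:

```python
import math
import cmath

def autocorr(x, maxlag):
    """Autocorrelation estimates r[0..maxlag] from a signal."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(maxlag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion: autocorrelation -> predictor coefficients
    a[1..order] of the all-pole model A(z) = 1 - sum_k a[k] z^-k."""
    a = [0.0] * (order + 1)
    e = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / e
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        e *= 1 - k * k
    return a, e

def group_delay(a, nfft=256):
    """Group delay tau(w) of H(z) = 1/A(z): the derivative of the
    unwrapped phase of A(e^{jw}), sampled on [0, pi)."""
    p = len(a) - 1
    phases = []
    for n in range(nfft):
        w = math.pi * n / nfft
        A = 1.0 - sum(a[k] * cmath.exp(-1j * k * w) for k in range(1, p + 1))
        phases.append(cmath.phase(A))
    unwrapped = [phases[0]]
    for ph in phases[1:]:               # remove 2*pi phase jumps
        d = ph - unwrapped[-1]
        while d > math.pi:
            d -= 2 * math.pi
        while d < -math.pi:
            d += 2 * math.pi
        unwrapped.append(unwrapped[-1] + d)
    dw = math.pi / nfft
    return [(unwrapped[i + 1] - unwrapped[i]) / dw for i in range(nfft - 1)]

# Synthetic one-resonance signal: a decaying sinusoid at 0.6 rad/sample.
x = [math.exp(-0.02 * n) * math.sin(0.6 * n) for n in range(400)]
a, _ = levinson(autocorr(x, 2), 2)
tau = group_delay(a)
peak = max(range(len(tau)), key=lambda i: tau[i])
print(round(math.pi * peak / 256, 2))  # peak sits near the 0.6 rad resonance
```

The parametric route matters because the model-based phase of A(z) is smooth by construction, whereas group delay computed directly from the DFT of the signal is notoriously spiky near spectral zeros.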
Speech Communication
Contains reports on five research projects. C.J. Lebel Fellowship; National Institutes of Health (Grant 5 T32 NS07040); National Institutes of Health (Grant 5 R01 NS04332); National Science Foundation (Grant IST 80-17599); U.S. Navy - Naval Electronic Systems Command (Contract N00039-85-C-0254); U.S. Navy - Naval Electronic Systems Command (Contract N00039-85-C-0341); U.S. Navy - Naval Electronic Systems Command (Contract N00039-85-C-0290)
Emergence of linguistic laws in human voice
Submitted for publication
Speech Communication
Contains reports on five research projects. C.J. Lebel Fellowship; National Institutes of Health (Grant 5 T32 NS07040); National Institutes of Health (Grant 5 R01 NS04332); National Institutes of Health (Grant 5 R01 NS21183); National Institutes of Health (Grant 5 P01 NS13126); National Institutes of Health (Grant 1 P01-NS23734); National Science Foundation (Grant BNS 8418733); U.S. Navy - Naval Electronic Systems Command (Contract N00039-85-C-0254); U.S. Navy - Naval Electronic Systems Command (Contract N00039-85-C-0341); U.S. Navy - Naval Electronic Systems Command (Contract N00039-85-C-0290); National Institutes of Health (Grant R01-NS21183), subcontract with Boston University; National Institutes of Health (Grant 1 P01-NS23734), subcontract with the Massachusetts Eye and Ear Infirmary
Speech Communication
Contains table of contents for Part IV, table of contents for Section 1, and reports on five research projects. Apple Computer, Inc.; C.J. Lebel Fellowship; National Institutes of Health (Grant T32-NS07040); National Institutes of Health (Grant R01-NS04332); National Institutes of Health (Grant R01-NS21183); National Institutes of Health (Grant P01-NS23734); U.S. Navy - Naval Electronic Systems Command (Contract N00039-85-C-0254); U.S. Navy - Office of Naval Research (Contract N00014-82-K-0727)